17 research outputs found

    Consistent distribution-free K-sample and independence tests for univariate random variables

    A popular approach for testing whether two univariate random variables are statistically independent consists of partitioning the sample space into bins and evaluating a test statistic on the binned data. The partition size matters, and the optimal partition size is data dependent. While coarse partitions may be best for detecting simple relationships, a great gain in power can be achieved by considering finer partitions when detecting complex relationships. We suggest novel consistent distribution-free tests that are based on summation or maximization aggregation of scores over all partitions of a fixed size. We show that our test statistics based on summation can serve as good estimators of the mutual information. Moreover, we suggest regularized tests that aggregate over all partition sizes, and prove that those are consistent too. We provide polynomial-time algorithms, which are critical for computing the suggested test statistics efficiently. We show, in simulations as well as on a real data example, that the regularized tests have excellent power compared to existing tests and are almost as powerful as the tests based on the optimal (yet unknown in practice) partition size.
    Comment: arXiv admin note: substantial text overlap with arXiv:1308.155
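
The binning idea described in this abstract can be illustrated with a much simpler test than the ones the authors propose. The sketch below is an assumption-laden illustration, not the paper's method: it uses one fixed partition of equal-count (rank-based) bins rather than aggregating over all partitions, scores it with the plug-in mutual information, and calibrates it by permutation. The function names (`rank_bins`, `plugin_mutual_information`, `binned_permutation_test`) and the default `m=3` are hypothetical choices made for this example.

```python
import math
import random


def rank_bins(values, m):
    """Assign each value to one of m equal-count bins according to its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * m // n
    return bins


def plugin_mutual_information(bx, by, m):
    """Plug-in mutual information (in nats) of two samples binned into m bins."""
    n = len(bx)
    joint = [[0] * m for _ in range(m)]
    for a, b in zip(bx, by):
        joint[a][b] += 1
    row = [sum(r) for r in joint]
    col = [sum(joint[a][b] for a in range(m)) for b in range(m)]
    mi = 0.0
    for a in range(m):
        for b in range(m):
            c = joint[a][b]
            if c:
                mi += (c / n) * math.log(c * n / (row[a] * col[b]))
    return mi


def binned_permutation_test(x, y, m=3, n_perm=500, seed=0):
    """Return (statistic, permutation p-value) for a single m-by-m rank partition.

    Illustrative only: the paper aggregates scores over all partitions of a
    given size, whereas this sketch evaluates one fixed partition.
    """
    rng = random.Random(seed)
    bx, by = rank_bins(x, m), rank_bins(y, m)
    observed = plugin_mutual_information(bx, by, m)
    shuffled = by[:]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any dependence between x and y
        if plugin_mutual_information(bx, shuffled, m) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Synthetic non-monotone example: y depends on x through a noisy parabola.
data_rng = random.Random(1)
x = [data_rng.uniform(-1, 1) for _ in range(200)]
y = [a * a + data_rng.gauss(0, 0.05) for a in x]
mi, p_value = binned_permutation_test(x, y)
print(f"plug-in MI = {mi:.3f} nats, permutation p-value = {p_value:.3f}")
```

Because the bins depend only on ranks, the null distribution of the statistic does not depend on the marginal distributions of the two variables, which is the sense in which such a test is distribution-free. A coarser or finer `m` changes which relationships the test can detect, which is the trade-off the abstract refers to.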

    Modular ‘Click-in-Emulsion’ Bone-Targeted Nanogels

    A new class of nanogel demonstrates modular biodistribution and affinity for bone. Nanogels, ~70 nm in diameter and synthesized via an astoichiometric click-chemistry in-emulsion method, controllably display residual, free clickable functional groups. Functionalization with a bisphosphonate ligand results in significant binding to bone on the inner walls of marrow cavities, liver avoidance, and anti-osteoporotic effects.
    Funding: National Institutes of Health (U.S.) (RO1 DE016516); National Institutes of Health (U.S.) (R01 EB000244); Damon Runyon Cancer Research Foundation (DFS-#2050-10

    Function of Cancer Associated Genes Revealed by Modern Univariate and Multivariate Association Tests

    Copy number variation (CNV) plays a role in the pathogenesis of many human diseases, especially cancer. Several whole genome CNV association studies have been performed for the purpose of identifying cancer associated CNVs. Here we undertook a novel approach to whole genome CNV analysis, with the goal of identifying associations between CNV of different genes (CNV-CNV) across 60 human cancer cell lines. We hypothesize that these associations point to the roles of the associated genes in cancer, and can be indicators of their position in gene networks of cancer-driving processes. Recent studies show that gene associations are often non-linear and non-monotone. In order to obtain a more complete picture of all CNV associations, we performed omnibus univariate analysis by utilizing dCov, MIC, and HHG association tests, which are capable of detecting any type of association, including non-monotone relationships. For comparison we used Spearman and Pearson association tests, which detect only linear or monotone relationships. Application of dCov, MIC and HHG tests resulted in identification of twice as many associations compared to those found by Spearman and Pearson alone. Interestingly, most of the new associations were detected by the HHG test. Next, we utilized dCov's and HHG's ability to perform multivariate analysis. We tested for association between genes of unknown function and known cancer-related pathways. Our results indicate that multivariate analysis is much more effective than univariate analysis for the purpose of ascribing biological roles to genes of unknown function. We conclude that a combination of multivariate and univariate omnibus association tests can reveal significant information about gene networks of disease-driving processes. These methods can be applied to any large gene or pathway dataset, allowing more comprehensive analysis of biological processes.
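
To make the contrast between linear and omnibus measures concrete, here is a minimal sketch comparing the Pearson correlation with the sample distance correlation (the normalized form of dCov) on a synthetic parabola. It is only an illustration of why a non-monotone relationship can be invisible to Pearson: it does not implement HHG or MIC, computes no p-values, and uses made-up data rather than the study's CNV measurements. The helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centered by row, column and grand means."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # the matrix is symmetric, so row means equal column means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of two univariate samples (O(n^2) time and memory)."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    if dvar_x * dvar_y == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


# Symmetric grid on [-1, 1] with a purely quadratic (non-monotone) dependence.
x = [i / 50 for i in range(-50, 51)]
y = [a * a for a in x]
print("Pearson:", pearson(x, y))
print("Distance correlation:", distance_correlation(x, y))
```

On this symmetric design the Pearson coefficient vanishes up to rounding even though y is a deterministic function of x, while the distance correlation stays clearly above zero. In practice a permutation test on the statistic, with adjustment for multiple testing as in the study, would be needed to declare an association significant.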

    Example of significant relationships.

    First line consists of three findings discovered only by Spearman or Pearson; second, only by HHG; third, only by dCov; and fourth, only by MIC. P-values (after adjusting for multiple testing) are denoted in each plot.

    Bipartite graph displaying gene-to-pathway associations, as determined by HHG and dCov.

    In panels A and B, genes (on the left) and pathways (on the right) were analyzed for association by HHG and dCov. Significant associations (after adjusting for multiple testing) are linked by lines: dashed for HHG, dotted for dCov, and solid for both. A) Significant associations between genes with unknown function and cancer related pathways. Associations found by dCov and HHG are marked. B) Significant associations between genes with known function and cancer related pathways. Only associations found by dCov are shown as no significant associations were found by HHG.

    Euler diagram of the significant discoveries found by Pearson or Spearman, dCov and HHG.

    MIC was excluded due to the small number of significant findings provided by this method. The area of each oval represents the number of significant tests of each method, and intersections (emphasized by different colors) represent common discoveries. Evidently, Pearson or Spearman, dCov and HHG share 185 discoveries; 184 tests were significant by HHG but not by Pearson, Spearman or dCov; 10 tests were significant by dCov and not by Pearson, Spearman or HHG; 29 tests were significant by Pearson or Spearman but not by dCov or HHG; dCov and HHG share 26 discoveries; Pearson or Spearman and dCov share 35 discoveries; and Pearson or Spearman and HHG share only 5 discoveries.
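
The region counts in this caption can be combined into per-method totals. The sketch below does that arithmetic under one explicit assumption that the caption does not state: each pairwise count (26, 35, 5) is taken to be a disjoint region of the diagram, that is, discoveries shared by exactly those two methods and not by the third. If the pairwise counts instead include the 185 common discoveries, the totals would differ. The labels `PS`, `dCov` and `HHG` are shorthand introduced for this example.

```python
# Region counts from the caption; keys are the sets of methods that found the discovery.
# ASSUMPTION: pairwise regions exclude the discoveries shared by all three methods.
regions = {
    frozenset({"PS", "dCov", "HHG"}): 185,
    frozenset({"HHG"}): 184,
    frozenset({"dCov"}): 10,
    frozenset({"PS"}): 29,
    frozenset({"dCov", "HHG"}): 26,
    frozenset({"PS", "dCov"}): 35,
    frozenset({"PS", "HHG"}): 5,
}


def total_for(method):
    """Number of significant tests for one method, summed over the regions containing it."""
    return sum(count for methods, count in regions.items() if method in methods)


union_total = sum(regions.values())  # discoveries found by at least one method

for method in ("PS", "dCov", "HHG"):
    print(method, total_for(method))
print("any method", union_total)
```

Under that reading, the totals come to 254 for Pearson or Spearman, 256 for dCov, 400 for HHG, and 474 for at least one method.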